Multilingual Information Extraction with PolyglotIE
نویسندگان
چکیده
We present POLYGLOTIE, a web-based tool for developing extractors that perform Information Extraction (IE) over multilingual data. Our tool has two core features: First, it allows users to develop extractors against a unified abstraction that is shared across a large set of natural languages. This means that an extractor needs only be created once for one language, but will then run on multilingual data without any additional effort or language-specific knowledge on part of the user. Second, it embeds this abstraction as a set of views within a declarative IE system, allowing users to quickly create extractors using a mature IE query language. We present POLYGLOTIE as a hands-on demo in which users can experiment with creating extractors, execute them on multilingual text and inspect extraction results. Using the UI, we discuss the challenges and potential of using unified, crosslingual semantic abstractions as basis for downstream applications. We demonstrate multilingual IE for 9 languages from 4 different language groups: English, German, French, Spanish, Japanese, Chinese, Arabic, Russian and Hindi.
منابع مشابه
Modern Multilingual and Cross-lingual Information Access Technologies
In this chapter, we describe the state of the art cross-lingual and multilingual strategies and their related areas. In particular, we show a WWW-based information system called MIETTA, which allows uniform and multilingual access to heterogeneous data sources in the tourism domain. The design of the search engine is based on a new cross-lingual framework. The framework integrates a cross-lingu...
متن کاملGATE: A Unicode-based Infrastructure Supporting Multilingual Information Extraction
NLP infrastructures with comprehensive multilingual support can substantially decrease the overhead of developing Information Extraction (IE) systems in new languages by offering support for different character encodings, languageindependent components, and clean separation between linguistic data and the algorithms that use it. This paper will present GATE – a Unicode-aware infrastructure that...
متن کاملMultilingual Extraction Ontologies
The growth of multilingual web content and increasing internationalization portends the need for cross-language query processing. We offer ML-OntoES (a MultiLingual Ontology-based Extraction System) as a solution for narrowdomain/data-rich applications. Based on language-independent extraction ontologies (Embley, Liddle, & Lonsdale, 2011), ML-OntoES enables semantic search over domain-specific,...
متن کاملMultilingual Ontologies and English- Bulgarian Ontology Development
In this paper we make a short survey of the approaches for development of multilingual ontologies. Our main goal is to find appropriate approach for development of multilingual ontologies, including Bulgarian language terminology. We propose a collaborative methodology for development of English-Bulgarian bilingual ontologies by usage of information extraction from e-learning textual content, l...
متن کاملExtraction of Multilingual Term Variants in the Business Reporting Domain
Within the context of the European research project ”Monnet”, which implements among other activities ontology-based multilingual information extraction, we tackle the the issue of recognizing variants of concept labels in business reports that guide the information extraction process. In this short paper, we describe two related experiments in finding variants of multilingual taxonomy labels u...
متن کامل